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Abstract 

Light harvesting complexes 2 (LH2) from Rhodo spirillum (Rs.) molischianum and 
Rhodopseudomonas (Rps.) acidophila form ring complexes out of eight or nine iden- 
tical subunits, respectively. Here, we investigate computationally what factors gov- 
ern the different ring sizes. Starting from the crystal structure geometries, we embed 
two subunits of each species into their native lipid-bilayer /water environment. Using 
molecular dynamics simulations with umbrella sampling and steered molecular dy- 
namics, we probe the free energy profiles along two reaction coordinates, the angle 
and the distance between two subunits. We find that two subunits prefer to arrange 
at distinctly different angles, depending on the species, at about 42.5° for Rs. molis- 
chianum and at about 38.5° for Rps. acidophila, which is likely to be an important 
factor contributing to the assembly into different ring sizes. Our calculations sug- 
gest a key role of surface contacts within the transmembrane domain in constraining 
these angles, whereas the strongest interactions stabilizing the subunit dimers are 
found in the C-, and to a lesser extent, N-terminal domains. The presented com- 
putational approach provides a promising starting point to investigate the factors 
contributing to the assembly of protein complexes, in particular if combined with 
modeling of genetic variants. 
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1 Introduction 



In photosynthesis, assemblies of pigment-protein complexes absorb sunlight 
and convert its energy into a biochemical potential. In recent years, tremen- 
dous progress has been made towards identifying the in vivo structure of the 
photosynthetic apparatus, in particular that of purple bacteria [1-4]. Calcu- 
lations and spectroscopic measurements reveal that a close relationship exists 
between the efficiency of light harvesting and the geometrical arrangements of 
pigment-protein complexes [5-8]. 

This raises the question as to how nature governs the assembly of its photosyn- 
thetic apparatus within its membrane. The antenna light-harvesting complex 
2 (LH2) of purple bacteria can be considered a paradigmatic model system 
to address this question, because (i) atomic-resolution crystal structures exist 
for LH2s with different organizations and (ii) mutagenesis and reconstitution 
assays allow for direct experimental studies of key factors in the assembly of 
LH2 complexes. 

LH2s are composed of repetitions of a basic unit of two transmembrane polypep- 
tides, a and j3. Each af3 heterodimeric subunit non-covalently binds three 
bacteriochlorophylls (BChls) and one carotenoid. Figure 1 shows the basic sub- 
unit of LH2 from Rhodo spirillum (Rs.) molischianum and Rhodopseudomonas 
(Rps.) acidophila, respectively, as taken from their crystal structures [9,10]. 
Figure 2 shows the corresponding sequences of LH2 a and f3 polypeptides. 
Both a and f3 polypeptides consist of polar N- and C-termini and a central 
hydrophobic region. The N-termini lie on the cytoplasmic side of the mem- 
brane, the C-termini on the periplasmic side. Amino acids in the central hy- 
drophobic region form two transmembrane a-helices. B850 BChls are ligated 
to the highly conserved His residues near the C-terminus. 

Interestingly, the crystallographic structures of LH2 reveal a different organi- 
zation of subunits, a ring of nine ct/3-sub units for Rps. acidophila [9, 11], but of 
eight a/3-subunits for Rs. molischianum [10]. EM and AFM studies reveal non- 
americ organizations for LH2s from Rhodovulum sufidophilum [12], Rhodobac- 
ter (Rb.) sphaeroides [13,14], and Rubrivivax gelatinosus [15,16], whereas a 
low-light variant form of LH2 from Rps. palustris reveals an eight-fold sym- 
metry [17]. In all of the above cases, the octameric or nonameric organization 
appears to be homogeneous within a given species, although chromatographies 
for 3D crystallization of LH2 from Rs. molischianum suggest the possibility of 
unstable LH2 forms with different numbers of subunits (H. Michel, personal 
comm.). Structural heterogeneity within a species has been described only for 
Rs. photometricum, where AFM studies suggest that most LH2s are organized 
in either eight-, nine-, or ten-fold symmetry [18]. 
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Fig. 1. Structure of a subunit of LH2 from Rs. molischianum (left) and from 
Rps. acidophila (right). Each subunit consists of one a polypeptide (blue ribbon), 
one /3-polypeptide (red ribbon), three BChls (green; phytyl chains truncated), and 
one carotenoid (yellow). The N-terminal domains are on top, the C-terminal do- 
mains on bottom. The surface of the subunit is superimposed onto the simplified 
representations 
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Fig. 2. Amino acid sequences of the light-harvesting polypeptides of LH2 from 
Rs. molischianum and Rps. acidophila. Transmembrane helical domains are high- 
lighted in red. 
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Differences in organization have also been reported for the core light-harvesting 
complex 1 (LH1), surrounding the photosynthetic reaction center. LH1, like 
LH2, is composed of repetitions of an af3 heterodimeric subunit; however it 
contains only two instead of three BChls per subunit. LH1 complexes were re- 
ported to be formed of 16 a/5-subunits (Rsp. rubrum [19], Blastochloris viridis 
[20]), 15 tt/3-subunits and one PufX-like peptide Rps. palustris [21], or 12 af3- 
subunits plus one PufX peptide plus a gap (Rb. sphaeroides [3,22]) 

Reconstitution assays [23] show that in many cases light harvesting complexes 
with very similar optical properties to the wild-type complexes can be recon- 
stituted in vitro from their individual components [24, 25]. Truncated versions 
of natural proteins, chemically synthesized de novo proteins, and mutagenetic 
gene products have been studied, revealing residues essential to formation of 
a/?-subunits and full complexes [26-30]. A recent study demonstrated in vivo 
assembly of redesigned peptides from Rb. sphaeroides into fully functional 
light-harvesting complexes [31] 

These results suggest that the building blocks of light harvesting complexes 
can self-assemble to form stable, functional complexes. Thus, one should be 
able to relate the observed differences in complex organization to the structure 
of their subunits. What features of the subunits govern the organization of 
the complete ring complexes, in particular the oligomerization state, i.e., the 
number of subunits employed in forming a ring? 

In the present manuscript, we investigate in how far the variations in oligomer- 
ization states can be explained by changes in the local interaction angle be- 
tween neighboring subunits. In theory, one expects that subunits with a pre- 
ferred interaction angle of 45° would assemble into a ring of eight subunits 
(8 x 45° = 360°), whereas subunits with a preferred interaction angle of 40° 
should form rings of nine subunits. It appears a remarkable feat for subunits 
of two helical proteins to control a difference in angle as small as 5° in the 
presence of conformational fluctuations and solvent dynamics. 

Starting from the crystal structures of LH2 from Rs. molischianum and Rps. aci- 
dophil^ henceforth referred to as MOLI and ACI, respectively, we embed two 
a/3-subunits from either species into their native lipid- water environment. We 
use molecular dynamics with umbrella sampling and steered molecular dy- 
namics to probe the free energy surfaces along various reaction coordinates, 
namely changes in the angle and distance between the subunits. The calcu- 
lation techniques, reviewed in the following section, represent, arguably, the 
most accurate calculations possible on a system of this scale. They provide a 
reference point against which results from simpler models can be compared. 
The calculations reveal significant differences between subunits from MOLI 
and ACI that we discuss with view of the consequences for protein complex 
assembly. We discuss future studies necessary to capture the essential proper- 
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ties governing the oligomerization state of light harvesting complexes. 



2 Theoretical and Computational Model 

As a first step in understanding how LH2 subunits assemble into precise ring 
structures, we have focused on determining and characterizing the interaction 
between two LH2 subunits. We started from a geometry characteristic to a full 
aggregate. Atomic models for two LH2 subunit dimers, namely for M OLI and 
ACI, can be readily built starting from their high resolution crystal structures 
available from the Protein Data Bank (entry codes 1LGH [32] and 1KZU [33], 
respectively). In order to mimic their native environment, we have embedded 
a pair of subunits for both MOLI and ACI into properly solvated lipid bilayers. 
The dynamical behavior of the systems were investigated by means of all atom 
molecular dynamics (MD) simulations. Details about the built systems and 
the MD protocols used are presented at the end of this section. 

Since, presently, MD simulations studies are restricted to the 10-100 nanosec- 
ond time scale, they cannot be applied directly to describe the complete as- 
sembly process between two LH2 subunits. Indeed, lateral diffusion of these 
protein units in the lipid membrane occurs on a much longer time scale than 
the one accessible by MD simulations. However, suitably designed MD simula- 
tions, combined with statistical mechanical analysis, can be used to determine 
the free energy profile or potential of mean force (PMF) [34] between the inter- 
acting subunits. The PMF then can be used as input in a suitable stochastic 
model (e.g., Fokker-Planck or Smoluchowski equation [35]) for describing the 
dynamics of the system at a mesoscopic or even macroscopic scale. To this 
end, one first needs to identify a small number of variables (e.g., distances 
and/or angles), hereafter referred to as reaction coordinates that describe the 
relative separation and orientation of the subunits. Then, one should use a 
properly designed set of equilibrium or non-equilibrium MD simulations to 
calculate the PMF (i.e, free energy) U (Q) of the system as a function of the 
reaction coordinates Q. Once the PMF is determined, the generalized force 
exerted between the subunits is equal to F — —dU(Q)/dQ. 

In what follows, we define reaction coordinates suitable for describing the 
self-assembly of LH2 subunits. Then we briefly describe two well established 
methods for calculating PMFs from MD simulations, namely (i) umbrella sam- 
pling [36, 37] combined with the weighted histogram analysis method (WHAM) 
[38], and (ii) steered molecular dynamics (SMD) [39,40] in conjunction with 
the application of the Jarzynski equality [41]. Method (i) uses equilibrium MD 
simulations, while (ii) relies on non-equilibrium simulations [34]. 
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2. 1 Reaction Coordinates R and 9 



The self-assembly of LH2 rings can be envisioned as a process in which pre- 
formed LH2 subunits are first inserted into the membrane, then brought within 
contact distance through diffusion in the lipid membrane, followed by final 
locking into the ring specific geometry. In a first approximation, this process 
can be modeled as a purely two dimensional one in which each subunit has a 
specific anisotropic 2D structure (e.g., a deformed disk) that at the end of the 
process forms an N-fold symmetric ring (N=8 for MOLI and N=9 for ACI). 

We define two reaction coordinates to characterize the spatial interaction be- 
tween LH2 subunits in their native membrane environment, namely R - the 
separation between them, and - their relative angular orientation. A more 
precise definition of reaction coordinates is illustrated in Fig. 3, displaying 
two MOLI subunits (from the equilibrated system) in a side view (a) and in a 
top view (b from the N-terminal or cytoplasmic side. Fig. 3c shows only the 
transmembrane helical domains of the two subunits (A±, B\) and (A 2 , B 2 ), 
showing a clear separation of a and (3 polypeptides. 

We define the center-of-mass (COM) of the a ((3) apoproteins in subunit i = 
1, 2 as Ai (B,j). Then 9 by definition is the angle made by the projections of the 
vectors Vi = AiBi, i = 1,2, on the xy-p\&ne of the membrane. The separation 
distance reaction coordinate is defined as the distance between the COMs 
of the a/5-apoprotein heterodimer within the xy-plane, i.e., R = |Ri — R2I, 
where Ri, i = 1, 2 are the projections of the position vectors of these COMs on 
the xy-pl&ne. Note that, at any instant of time, the reaction coordinates are 
determined (through the COMs) by the coordinates and masses of a selected 
group of atoms from both subunits. 

In selecting the group of atoms that define the reaction coordinates one has 
to make sure that under equilibrium (or stationary) conditions, the reaction 
coordinates have well defined mean values and sufficiently small fluctuations. 
Otherwise, the reaction coordinates are ill defined and cannot be used to 
characterize the system quantitatively. For example, we have found that (see 
Sec. 3) R and 9 are well defined if one considers only the heavy atoms in the 
transmembrane helical (TMH) domains of the a/9-apoproteins. Adding to the 
selection all the heavy atoms in the N- and C-terminal domains, would lead 
to substantial increase in the fluctuations of 9 rendering the latter reaction 
coordinate meaningless. Finally, in what follows, we use the convention that 
R and 9 refer to particular target values of the reaction coordinates, while 
R = R(q) and 9 = 9(q) refer to the instantaneous values of the reaction 
coordinates determined from the corresponding positions q = cjj(t) of all 
defining atoms. 
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Fig. 3. Definition of the reaction coordinates 6 and R illustrated for two subunits 
of LH2 from Rs. molischianum. (a) Side view with the N-termini on top. (b) Top 
view from the N-terminal or cytoplasmic side in space-filling representation with 
transmembrane helices highlighted in ribbon representation, (c) Top view from from 
the N-terminal or cytoplasmic side of only the a and (5 polypeptides with definition 
of reaction coordinates. 

2.2 Potential of Mean Force U(0, R) 



The PMF U(Q) = U(9, R) is the free energy of the system formed by the two 
interacting subunits for well defined spatial R and angular 9 separations; for 
brevity we have introduced the compact notation Q = {9, R}. In principle, the 
PMF can be calculated by integrating out all the degrees of freedom except 
for the reaction coordinates [34], i.e., 

e-W) = po(Q) = / dT e -^—5lQ - Q(T)\ , (1) 
j z 



where (3 = is the temperature factor {Jzb is the Boltzmann constant 

and T is the absolute temperature), po(Q) is the equilibrium distribution func- 
tion of the reaction coordinates, Hq(T) is the Hamiltonian of the system as 
a function of V = {q, p} (i.e., all the atomic coordinates and momenta), Z 
is the partition function and 5(x) is the Dirac-delta function whose filtering 
property guarantees that the integrand in Eq. (1) is nonzero only when the 
reaction coordinates have the requested value, i.e, when Q(T) = Q. In princi- 
ple, the equilibrium distribution function po(Q) can be easily computed from 
MD simulations, since it is proportional to the binned histogram of the reac- 
tion coordinates calculated along the MD trajectory. Thus, the PMF is readily 
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given by 

U(Q) = -k B T\og Po (Q) 



(2) 



However, in practice, even the longest equilibrium MD trajectories will sample 
only a restricted region of the reaction coordinate domain (i.e., within the 
vicinity of the PMF minimum) of interest and the direct application of Eq. (1) 
is impractical. 

There exist two basic methods for calculating PMFs from MD simulations 
widely known as (1) umbrella sampling [36], and (2) steered molecular dynam- 
ics [42,43]. Next we present briefly both methods within a unifying conceptual 
framework, and point out under what conditions one should opt for using one 
or the other. In both methods a crucial step is to alter the dynamics of the 
system by applying a suitable guiding potential. 



2.3 Harmonic Guiding Potential 



During equilibrium MD simulations the system explores only a small region 
of the phase space V about the minimum of the sought PMF U (Q) . In order 
to properly sample energetically more difficult to reach regions, one needs to 
guide or steer our system towards those regions by employing, e.g., a harmonic 
guiding potential (HGP) 

V Q {Q)^V{Q{Y)\Q) = k -^[Q{T)-Q}\ (3) 



where fcg is the stiffness of the HGP. With this extra potential energy, the 
Hamiltonian of the new, biased system becomes 

H Q = H + V Q (Q), (4) 



and, as a result, the atoms in the selections that define the reaction coordinates 
will experience extra forces 

Fi = _^_ M0(r) _ Q] Mff) (5) 

Thus, the HGP (3) will force the system to evolve in the phase space in such 
a way that during its time evolution Q will stay confined in the vicinity of the 
target Q value. 
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2.4 Method I: Umbrella sampling and WHAM 



In umbrella sampling, one divides the reaction coordinate interval of interest 
(Qmin, Qmax) that one wants to sample in N w windows by conveniently chosen 
values Qi, i = 1,...,N W . Next, the reaction coordinate is sampled in each 
window separately by preparing identical replicas of the system and applying 
the harmonic guiding potential Vq^Q). As a result, the biased distribution 
functions can be readily obtained by direct sampling of the reaction coordinate 
for the biased system [34,36,38], i.e, 

Pi(Q) = / dT e -—5[Q - Q{V)\ = ^e-^)po(Q) , (6) 



where, for brevity, the index Qi has been replaced by i. The equilibrium dis- 
tribution in each window is related to the biased distribution of the reaction 
coordinate through 

po(Q) = f^ Q WQ) • (7) 

^0 



The standard method to efficiently stitch together the biased Pi(Q)'s in order 
to obtain the equilibrium p^, and therefore the sought PMF (2), is the so called 
weighted histogram analysis method or WHAM. According to this method, 
Po(Q) is expressed as a weighted sum over the biased distributions Pi{Q) as 
follows 

MQ) = ^ Nw - e -^ (0 ) (8) 



where 

( e -W) = f dQ Po (Q)e-^ . (9) 



The above non-linear coupled WHAM equations, that need to be solved it- 
eratively, minimize the errors in determining po(Q), and therefore the PMF 
U(Q). This method will give good PMFs only if the overlap between windows 
is substantial. Whenever a reasonable number of well overlapping windows 
can be constructed, the umbrella sampling method with WHAM should be 
the top choice for calculating PMFs. In general, this method works very well 
for determining both ID and 2D PMFs. We have used this method to calcu- 
late the 9 dependent PMF for a constrained separation distance R between 
LH2 subunits as reported in Sec. 3. 
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2. 5 Method II: Steered Molecular Dynamics and Jarzynski Equality 

In many cases, it is desirable to vary in time the target value of the RC 
according to a prescribed rule. For example, we can guide or steer our system 
in the direction of increasing the separation distance R between the two LH2 
subunits during a given simulation time r from an initial value Ri to a final 
one Rf by setting in the HGP (3) 

Q(t) = R(t)=Ri + v R t, v R = {R f - R t )/r . (10) 

By choosing two or more reaction coordinates, one can easily devise compli- 
cated reaction coordinate paths along which the system can be steered using 
a moving HGP. With such HGP the Hamiltonian of the system becomes time 
dependent, and it can be expressed by inserting Eq. (10) into Eqs. (4) and 
(3). Then, the extra forces that need to be applied to the atoms defining the 
reaction coordinates can be calculated with the same formula (5). Clearly, 
the corresponding SMD simulations are non-equilibrium. Already a few SMD 
pulling simulations may help us gain some qualitative insight into the behav- 
ior of the system as it evolves along the reaction coordinate. We apply this 
method to investigate the correlation between 9 and R in LH2 subunit dimers 
when one of the reaction coordinates is either increased or decreased in time. 
The reconstruction of the PMF U (Q) from such non-equilibrium simulations 
requires a sufficiently large number of SMD trajectories. These trajectories 
then need to be analyzed by employing the Jarzynski's equality that connects 
the free energy differences of two equilibrium states with the exponential av- 
erage of the irreversible mechanical work done in non-equilibrium processes 
that connect the equilibrium states in question [41-44], i.e., 

e -P(F Q -F ) = ( e -PW Q ) ; (11) 

where the irreversible work along a path that connects states with given RC 
values Qo and Q is given by 




Here, (. . .) denotes the average over an ensemble of trajectories. 

It should be noted that for very large switching times r (adiabatic approxima- 
tion), the work becomes reversible and we recover the expected equilibrium 
result W-t-kjo = Fq — F . On the other hand, for an instantaneous switching 
time W T ^ = H Q [T(0)] - H \T(0)] = AH, and the Jarzynski equality (11) 
yields again the expected result, i.e., exp(— (3AF) = (exp(/3Aif)). 
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In general, the PMF calculation based on the application of the Jarzynski 
equality to trajectories from SMD simulations is preferred to the umbrella 
sampling method whenever the fluctuations of the reaction coordinate are 
small and a huge number of sampling windows would be required for assuring 
proper overlap of the reaction coordinate distribution histograms. However, it 
is not totally clear how many SMD trajectories one should use in such a case 
for calculating the PMF with sufficient accuracy. Since for our LH2 subunit 
dimer system the fluctuations in R are rather small, i.e., ~ 0.2 A, and a 
complete detachment of the subunits requires an increase in R of 20 - 30 A, 
the present method would be more suitable to calculate U(R\9) than the 
umbrella sampling method. 



2.6 System modeling and MD simulations 

Here we provide a brief description of the modeling of our MOLI and ACI 
dimers, and of the MD simulations performed. 

ACI dimer - two adjacent complete LH2 subunits (protein and cofactors) were 
extracted from the PDB structure 1KZU [33]. After adding the missing hy- 
drogens, the protein complex was inserted in a pre-equilibrated and hydrated 
POPC lipid bilayer. Finally, the system was solvated by adding extra water 
layers to the two sides of the lipid bilayer. Besides the proteins and cofactors 
the system contained 4014 water molecules and 169 POPC lipid molecules, 
with a total system size of 38,594 atoms. The +4e charge of the system was 
neutralized by properly adding 4 CI - counter ions. The system was built by 
using XPLOR [45] and VMD [46]. 

MOLI dimer - using VMD [46] and its plugins, two complete neighboring 
units extracted from a fully equilibrated 8-fold LH2 MOLI ring, reported in 
one of our previous MD studies [47] were inserted in a pre-equilibrated and 
hydrated POPE lipid bilayer, and then solvated by adding two pre-equilibrated 
8 A thick water layers to each side of the membrane. The +4e charge of the 
system was neutralized by properly adding 4 CI - counter ions. In addition 
to the protein and cofactors, the final system contained 8299 water molecules 
and 128 POPE lipids, with a total system size of 44,997 atoms. 

Equilibrium MD simulations - After proper energy minimization, the ACI 
(MOLI) system was equilibrated at 300 K (310 K), normal atmospheric pres- 
sure through a 4 ns (5 ns) long MD simulation in the NPT ensemble. Peri- 
odic boundary conditions and full electrostatics via the Particle Mash Ewald 
method were used. All MD simulations were carried out with the NAMD [48] 
program using the CHARMM 27 parameter set [49]. The simulations were 
carried out on local Linux Beowulf clusters, and the required time for 1 ns 
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simulation running on 30 CPUs was about 1.5 days. 

SMD simulations - A total of 20 SMD simulations were carried out for each 
system, starting from the same state that coincided with the last frame of 
equilibrium MD run. The details of the applied harmonic guiding potential 
for each of the runs are described in Sec. 3.2. 

Umbrella sampling simulations - To determine the PMF U(9\R) for R = Rq = 
18 A and R = R x = 25 A, umbrella sampling MD runs were carried out in 
a a number of windows, starting from 9 = 33° to 9 = 53°, that resulted 
in well overlapping histograms. For ACI (MOLI) at Rq the windows were 
centered around the following 9i angels (measured in degrees): 33, 35, 37, 38, 
39, 40, 41, 42, 43, 45, 47, 49, 51 and 53 (35, 36, 37, 38, 39, 40, 41, 42, 43, 
45, 47, 49, 51 and 53). For R x the choice of window positions were similar. 
The target angle was enforced by applying a 2D harmonic guiding potential 
Vij(9,R) = k i (9-9, l ) 2 /2 + k R (R-R j ) 2 /2 with j = 0,x, k R = 80 kcal/mol A 2 , 
and hi tuned between 8 — lOx 10 3 kcal/mol rad 2 for achieving optimal sampling 
in each window. After exhaustive testings, simulations were performed for 
0.7 ns for each window. Only the last 0.5 ns parts of the trajectories were used 
to construct the histograms. Test runs confirmed that complete sampling in 
each window is achieved on this time scale. The hi, 9>i and 9 histograms have 
been used in the WHAM equations to calculate the PMF U{9\R) as described 
in Sec. 3.3. 



3 Results 

3.1 Equilibrium MD simulations of MOLI and ACI dimers 

In order to test their reliability and usefulness, we have monitored the time 
evolution of the reaction coordinates R and 9, during 5 ns long equilibrium MD 
simulations of two LH2 subunit dimers, one from MOLI (Rs. molischianum) 
and one for ACI (Rps. acidophila). The simulated systems were prepared as 
described in Sec. 2. The calculated values of the reaction coordinates, i.e., R - 
the separation, within the plane of the membrane, between the center-of-mass 
(COM) of the a/5-polypeptide pairs in the two subunits, and 9 - the angle 
between the projection on the membrane plane of the lines that connect the 
COM of the a- and /3-polypeptides for each subunit (cf. Fig. 3), depends on the 
selection of atoms used to determine the COMs. Through extensive testing, 
we found that the most stable reaction coordinates correspond to the situation 
in which only the heavy atoms of the trans-membrane helical (TMH) domain 
of the a/9-apoproteins are considered. In this case, the reaction coordinate 
assumes well defined mean equilibrium values. The distribution histograms of 
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the values of the reaction coordinates for the last 2 ns of the MD trajectories 
are shown in Fig. 4. The peak positions in the histograms (corresponding 
to the most probable values of the reaction coordinates) in Fig. 4 are R ~ 
18 ± 0.08 A, 9 « 42.4° ± 1.7° for MOLI, and R « 17.8 ± 0.15 A, 9 « 
38.1° ± 1.4° for ACL For comparison, Fig. 4 also includes the corresponding 
histograms for a complete Rs. molischianum LH2 ring, obtained from the last 
3 ns part of a 5 ns long equilibrium MD run, and with R « 18.2 ± 0.2 A and 
#o = 45 ± 2.1°. According to these results, we find that for both MOLI and 
ACI subunit dimers, 9 is about 2-3 ° smaller than the compared theoretically 
expected value (i.e., 45° for MOLI and 40° for ACI). However, there is a 
significant difference in the most probable angle of about 4°, close to the 
theoretically expected difference of 5°. For the MOLI LH2 ring 9q coincides 
with the expected 45°, albeit the fluctuations in the angle are noticeably larger 
than for the subunit dimers. 

Furthermore, fluctuations in R are much smaller than fluctuations in 9, and 
the mean value R is essentially the same (within less than half of one A) 
for all three systems. If one tries to extend the protein atoms selection in the 
definition of the reaction coordinates, e.g., by including the CT (C-terminus) 
and NT (N-terminus) domains of the a/3-polypeptides, the resulting reaction 
coordinates become ill defined especially due to the sharp increase in the 
magnitude of the fluctuations. In such cases, we find that during test MD 
simulations (not shown) the fluctuations both in 9 and R increase by more 
than a factor of 2. These findings are consistent with the knowledge that in 
general the TMH regions of membrane proteins are more rigid than the outer 
membrane parts. 




e [deg] R [A] 



Fig. 4. Histograms of the reaction coordinates (left) and R (right) corresponding 
to equilibrium MD simulations as follows: MOLI dimer, 2 ns run (solid curve), ACI 
dimer, 2 ns run (dashed curve), and MOLI LH2 ring, 3 ns (long-dashed curve). 
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3.2 Forced detachment of the subunits using SMD 

Next we employ the reaction coordinates introduced above to investigate the 
relationship between the spatial separation and relative orientation of the two 
LH2 subunits as a result of their forced detachment. In principle, such a study 
requires the knowledge of the 2D potential of mean force (PMF) U (R, 9) , that 
describes the statistical mechanical state of the system as a function of the two 
reaction coordinates. However, a brute force determination of the PMF (by 
direct application of either methods of calculating PMFs described in Sec. 2) 
is computationally prohibitively expensive. 




R[A] 

Fig. 5. Two dimensional density plot of the volume overlap of the transmembrane 
protein regions of MOLI (left) and ACI (right) dimers along the reaction coordinates 
9 and R recorded in the SMD simulations described in the text; the trajectories 
corresponding to the four different sets of simulations are indicated by numbered 
arrows. The volumes are relative to the value corresponding to the equilibrium 
reaction coordinates 9q and R$ (see text), marked by the small circle. 

Thus, to gain some insight into the mechanism that governs the relationship 
between the relative distance and orientation of the LH2 subunit dimer dur- 
ing the forced separation or compression of the subunits, we have conducted 
four sets of 5 SMD simulations each, starting from initial states that coin- 
cide with the last frames of the equilibrium simulations shown in Fig. 4, and 
characterized by R = 18.2 A and 9 = 43.7° for MOLI and R = 18.1 A 
and #o — 41° for ACI. These starting values are marked by small circles in 
Fig. 5. In the first (second) set of simulations 9 was unconstrained while R 
was increased (decreased) with a constant rate of v r = 0.1 A/ps, by ap- 
plying a harmonic guiding potential V(R\R) = kn(R — R) 2 /2, as described 
in Sec. 2, with kR = 500 kcal/molA . Similarly, in the third (fourth) set 
of simulations R was unconstrained while 9 was increased (decreased) with 
Vg = 0.1 deg/ps through a harmonic guiding potential V{9\9) = k e {9 — 9) 2 /2 
with kg = 5 x 10 4 kcal/mol rad 2 . In these potentials, R and 9 are the target 
values of the reaction coordinates while R and 9 represent the instantaneous 
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value of the reaction coordinate as determined from the corresponding atomic 
coordinates. The SMD trajectories in the reaction coordinate plane (i.e., the 
loci of points with coordinates {R(t), 9(t)}) are shown in Fig. 5 for both MOLI 
and ACL The different trajectories belonging to different sets of simulations 
are indicated by numbered arrows. Trajectory points are color-coded according 
to the current volume overlap of the TM parts of the two subunits, normalized 
to the corresponding initial state value. 

The reaction coordinate trajectories exhibit distinctive features for each set of 
simulations, with manifest differences between the two systems. 

As soon as the separation R between the two subunits is increased (set 1) past 
Rq, the TMH domains of the ct/3-apoproteins separate for both MOLI and ACI, 
an event which is accompanied by a noticeable increase in the fluctuations of 
the angle reaction coordinate. While in the case of ACI, the separation of the 
N-terminal and TMH domains seem to occur simultaneously as R increases, 
in contrast, for MOLI, the N-terminal domains of the subunits do not detach 
until R becomes larger than 35 A. On the other hand, the C-terminal domains 
remain connected in both systems even for separations as large as 40 A. This 
difference may also be responsible for the profile of the trajectories in set 1. 
For MOLI, as R increases, the trajectories cluster in three well distinguished 
paths, with occasional partial overlaps and are subject to large 9 fluctuations. 
This suggests that the PMF U(9\R), for a given R > Rq, has a broad global 
minimum with several (at least three) local minimums separated by relatively 
small potential barriers. The location of the minimum is shifted to 9 > 9 
values (see below). On the other hand, for ACI, as R increases, the trajectories 
remain clustered (albeit with enhanced 9 fluctuations) suggesting that the 
PMF U{9\R) has a potential well that is broader than the one for Rq. However, 
the dramatic deviation of one of the trajectories from the rest for R > 35 A 
suggests that for larger separation U(9\R) may develop a structure with at 
least two well separated local minimums, with one equilibrium angle smaller 
and the other one larger than 9 . 

The behavior of the reaction coordinate trajectories for the simulations in sets 
2, 3 and 4 are qualitatively similar for both MOLI and ACI. Already a few A 
compression of R (set 2) increases the overlap of the protein subunits several 
times, accompanied by a decrease of the angle variable. The trajectories nicely 
overlap, indicating that the corresponding PMF U(9\R) is similar to the one 
corresponding to Rq, with the minimum shifted towards a smaller angle than 

Finally, it is remarkable that for the simulations in sets 3 and 4, in which R 
is unconstrained, the latter shows only a slight increase with respect to Rq as 
the angle is increased (decreased) by as much as 20° (10°). Thus, based on 
the results of our SMD simulations, one may conclude that in general forced 
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Table 1 



Average separation distances at which the most important inter-residue links be- 
tween two LH2 subunits of Rs. molischianum are broken in SMD simulations. 
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Table 2 

Average separation distances at which the most important inter-residue links be- 
tween two LH2 subunits of Rps. acidophila are broken in SMD simulations. 

rotation of the relative orientation of the subunits has only very limited ef- 
fect on their spatial separation. On the other hand, the modification of the 
distance between the subunits in general has strong impact on the angle be- 
tween the subunits. This conclusion also provides support to the notion that 
the preferred angle between LH2 subunits is mainly determined by contact 
interactions between subunits. 

Steered molecular dynamics simulations also provide insights into the key 
interactions between the subunits: as two subunits are pulled apart, the links 
between the subunits break from the weakest to the strongest. We consider a 
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link broken, if the distance between the closest contact between two residues 
becomes larger than 3 A. A list of strongest links, as presented in Tables 1 
and 2, provides qualitative information on the basis of SMD simulation data. 
With increased sampling, more refined analysis in terms of e.g. interaction 
energies or free energy barriers would become possible. Inspection of the Tables 
shows that all links in the TMH domain become separated for distances larger 
than R = 25 A for both cases. In the N-terminal domain, all links become 
separated for distances around 25 A for ACI and around 35 A for MOLL 
This is consistent with the overlapping volume data shown in Fig. 5. On the 
other hand, there is one group of links in the C-terminal domain for both ACI 
and MOLI that persists even for separations larger than 40 A. This group 
represents the strongest interactions conferring overall stability of the binding 
of the two subunits. An immediately recognizable difference is that most of the 
links in ACI are between the a-apoproteins while they are between between 
the pi and a2 in MOLI. 



3. 3 Calculation of PMF U(6, R) 



Our focus in the present manuscript is to investigate the interaction angle 
between two subunits. To this end, we have determined two ID PMFs Ui(8\Ri) 
along 9 by using umbrella sampling and WHAM, as described in Sec. 2. 

The calculations were performed for both MOLI and ACI, for two represen- 
tative separations, i.e, the equilibrium _R = 18 A and R x = 25 A. While the 
choice for the equilibrium value is obvious, the reason for the R x selection is 
that at this particular distance MOLI and ACI are in qualitatively different 
separation states. While, for R x , the TMH domains of the a/5-polypeptides 
are already separated in both MOLI and ACI, the N-terminal domain is fully 
separated only in ACI, but not in MOLI; the C-terminal domains are still in 
contact for both systems. 

The computed PMFs, using umbrella sampling and WHAM, are shown in 
Fig. 6. For R , as expected, the PMF for both MOLI and ACI exhibits a 
nearly parabolic lineshape with a minimum that coincides with the position 
of the peak 9q in the corresponding equilibrium angle distribution histogram 
in Fig. 4a 6 P . In fact, the PMFs calculated from those histograms as Uq(6) oc 
— ksT ln[po(#)] matches rather well the bottom of the full PMF obtained from 
umbrella sampling and WHAM (data not shown). At the equilibrium distance, 
the PMF of ACI is narrower than that of MOLI, corresponding to stronger 
angular constraints for ACI compared to MOLI. This is consistent with the 
finding from the SMD simulations that the fluctuations in the angular reaction 
coordinate are smaller for ACI than for MOLI. Compared to the Rq case, for 
R x the PMF for both systems widens up and acquires additional features. 



17 



e [deg] 



Fig. 6. Calculated potentials of mean force U(8\R) for both MOLI (solid lines) 
and ACI (dashed lines) dimers. The thick (thin) curves correspond to R = 18 A 
(R = 25 A). 

For MOLI, the PMF exhibits a small plateau at angles around 40°, and a 
steep downhill region for 9 > 42° that ends in a broad minimum around 49°, 
followed by a modest potential barrier at ~ 53°. These features are consistent 
with the SMD results. Indeed, the angle spread of the SMD trajectories at 
R x extends from ~ 40° to ~ 55°. The steep potential barrier in the PMF at 
< 40° explains the lack of trajectory points below this value. Also, the higher 
trajectory points density along the plateau region and the broad minimum is 
expected. For ACI, the PMF for R x shows a ~ 10° wide, rather flat region 
starting from 39°. The fact that the corresponding SMD trajectory points are 
clustered only in the interval 38° < 9 < 44° does not conflict with the PMF 
data but raises the question why there are no trajectory points up to angles 
of ~ 50°? There are several possible answers. First, the small number of SMD 
trajectories may not provide a proper sampling of the angles for R x . Secondly, 
the 9 self diffusion coefficient may be very small, so that a flat PMF does not 
lead to significant dispersion. The second well in the PMF at 54° suggests 
that eventually a second branch of trajectories may appear oriented towards 
higher angle values as it appears indeed for R > 32 A. 



4 Discussion and Conclusions 

Understanding the assembly of a protein complex requires that one addresses 
a set of interrelated questions: (1) What is the temporal order in which parts 
are put together? (2) What interactions stabilize the protein complex or parts 
of the complex? (3) What factors govern the (reproducible) stable geometry 
of the complex? 

In this manuscript, we focus on the assembly of a two-subunit complex from 
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two fully formed a/^-subunits with three BChl and one carotenoid bound. 
Spectroscopic observations support the assumption that this is an important 
step in the formation of light harvesting complexes as they show a progression 
from monomeric BChl (777 nm) to a BChl dimer bound to an a/3-subunit (820 
nm) to a complex formed of two ct/3-subunits (851 nm) [50,51]. It is, however, 
unknown whether carotenoids are already incorporated at this step and how 
closely the structure of the a/3-subunit at this point matches the structure of 
the a/9-subunit in the crystal structures, from which we took the coordinates. 

An initial analysis of fluctuations in a two-subunit complex equilibrated in a 
lipid-water environment revealed that the hydrophobic core region is rather 
rigid. Fluctuations in the centers-of-mass of the core regions in free molecular 
dynamics runs are small enough to allow for a meaningful definition of global 
coordinates (distance, angle) of the subunits as introduced above. 

In order to probe the assembly process, we performed two sets of calcula- 
tions with different philosophies. In one set of calculations, we constrain the 
subunits at different angles and allow the system to equilibrate under this con- 
straint. From such umbrella sampling, we then extract information about the 
free energy profile as a function of angle. Pulling two subunits apart through 
steered molecular dynamics gives complementary information into key events 
of the binding/unbinding process and a rough order of interaction strengths. 



4-1 Preferred angle 

Our calculations show that two subunits in van-der-Waals contact (center-to- 
center distance of 18 A) arrange at a preferred angle with each other. Figs. 7a, c 
show the transmembrane domains of subunits in the preferred arrangements 
at this distance. For subunits of LH2 from Rs. molischianum, the minimum 
of the free energy profile (PMF; cf. Fig. 6) is located at about 42.5°, whereas 
for Rps. acidophila subunits, the minimum is located at 38.5°. The free en- 
ergy profiles closely match parabolic profiles. Whereas changes of about one 
degree around the minimum position carry only a small energetic penalty, 
it requires about 3 kcal/mol to force two subunits from Rps. acidophila into 
the angle of 42.5° preferred by subunits from Rs. molischianum. Likewise, 
about 2 kcal/mol are required to force two subunits from Rs. molischianum 
into the angle of 38.5° preferred by subunits from Rps. acidophila. This sug- 
gests that the preferred angle between the subunits plays an important role in 
guiding the assembly of light harvesting complexes into a particular ring size 
or oligomerization state. However, the preferred angles, while comparing well 
to the theoretically expected angles of 45° (8-fold symmetry) and 40° (9-fold 
symmetry) are somewhat smaller than these. It is possible that the theoreti- 
cally expected preferred angle is only assumed once each subunit is in contact 
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with a sub unit on either side, as would be the case in a ring geometry. The 
fact that the average interaction angle in a free molecular dynamics simulation 
run of a completed ring is 44.8°, much closer to the expected 45°, supports 
this assumption. 




Fig. 7. Top view from the N-terminus of the transmembrane region of MOLI (a and 
b) and ACI (c and d) dimers. The values of the corresponding reaction coordinate 
R and 6 are indicated for each case. The backbone of the transmembrane helices are 
shown in cartoon representation. The sidegroups of the proteins and the enclosed 
cofactors (BChls and Car) are shown in transparent space-filling representation. 

To see whether the surface contact between the two subunits is important 
in defining a preferred angle, we performed additional calculations where two 
subunits are constrained to a center-to-center distance of 25 A. At this dis- 
tance, subunits of Rs. molischianum are still connected at both their C- and 
N-terminal domains, and subunits of Rps. acidophila are connected at their 
C-terminus only. However, the a(3 polypeptides in the transmembrane region 
are no longer in contact for either subunit pairs (cf. Fig. 7). In this case, the 
free energy decreases at larger angles with considerably more shallow minima 
than the free energy profiles at the equilibrium distance of 18 A. 

These findings suggest that the exact preferred angle between two subunits 
is largely defined by the surface interactions in the transmembrane region. 
Once the a/?-polypeptides in the transmembrane region become disconnected, 
the angular variation increases significantly (see Fig. 5). However, angular 
constraints, albeit more relaxed, still exist even at larger separations of the 
subunits. 
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Interestingly, the angle between two subunits is more constrained in Rps. aci- 
dophila than in Rs. molischianum. The parabola of the free energy profile at 
equilibrium distance (18 A) is narrower for Rps. acidophila than for Rs. molis- 
chianum. Moreover, the five trajectories of Rps. acidophila during the steered 
molecular dynamics simulations show relatively little angular spread after the 
transmembrane regions are disconnected and even after the links in the N- 
terminal domain are broken. Much more angular spread can be observed in 
the case of equivalent steered molecular dynamics runs for Rs. molischianum, 
even for cases when the links in both the C- and N-terminal regions are still 
intact. As stated above, the fast pulling and small sample size during this sim- 
ulation raises a caveat, as the configuration space may not have been sampled 
well. 

If a more complete sampling of the 2D free energy surface confirms the ob- 
served differences in angular constraints, it would suggest that in Rps. aci- 
dophila a motif in the C-terminal domain plays a role in constraining the 
angle between subunits in addition to the hydrophobic surface contacts. One 
can speculate that this double constraint of the angle could be the reason why 
9-rings are much more prevalent among LH2 complexes than rings of other 
sizes. 



4-2 Stabilizing links 

In addition to exploring the factors constraining the angle between two sub- 
units, the main focus of this publication, we chose computationally inexpen- 
sive SMD calculations to obtain qualitative, but not quantitative information 
about the factors stabilizing the connection between two subunits. 

The results from these calculations suggest that links in the terminal domains 
play an important role in stabilizing the complex. These links break last as 
subunits are pulled apart and thus represent the strongest interactions between 
the two subunits. The strongest interactions are all found in the C-terminal 
domain. Interactions in the TMH domain are weakest (cf. Tables 1,2). Only 
a limited number of experiments address the question of the requirements for 
formation of a light harvesting complex; most mutagenesis experiments are 
concerned with the formation of a a/5-subunit. In experimental studies that 
led to successful formation of a complete light harvesting complex, the termi- 
nal domains were taken from native sequences, thus offering little insight into 
what parts of the terminal domains are required for successful formation of 
complete light-harvesting complexes [24, 25, 30]. Recently, Braun et al. demon- 
strated formation of functional light harvesting complexes from polypeptides 
in which all amino acids in the TMH domain except for the ligating Hiso were 
replaced by alternating pairs of alanin and leucine residues [31]. An addition 
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of four amino acids in the C-terminal domain resulted in a complete loss of 
light harvesting complex formation. This study supports a crucial role the 
C-terminal domain in connecting subunits to form complete light harvesting 
complexes, whilst indicating a less important role of the TMH domain. 

So far, we have discussed the two dimensions distance and angle separately 
from each other. Interestingly, the differences between Rs. molischianum and 
Rps. acidophila suggest different rules by which distances and angles are stabi- 
lized. In Rs. molischianum, separation of two subunits appears to require more 
energy as the N-terminal domains stay connected far beyond distances where 
they are disconnected in Rps. acidophila. On the other hand, as discussed 
above, the angle is more constrained in Rps. acidophila. A proper discussion 
of the interplay between constraints in angle and distance, requires evaluation 
of the full two-dimensional PMF, which is beyond the scope of this publication. 

4.3 Outlook 

Our results suggest a guided key-lock principle in the assembly of light har- 
vesting complexes. Interactions in the terminal domains, in particular the 
C-terminal domain serve as 'hooks' that connect the subunits; however, they 
do not constrain the angle between the subunits very strongly, although in 
the case of Rps. acidophila, the C-terminus may assist in constraining the an- 
gle. Once the surfaces of the subunits, in particular the BChl co-factors, start 
to interlock in the hydrophobic core region, the angle between the subunits 
becomes well defined. 

To our knowledge, no previous theoretical study has attempted to address the 
question as to why some LH2 complexes form 8-rings, while most form 9-rings. 
It is therefore a very intriguing result that we found a distinct difference in 
the preferred angle between two subunits that closely matches the expected 
angle difference of 5° between an 8-ring in Rs. molischianum and a 9-ring in 
Rps. acidophila. 

This result is a promising starting point towards understanding the rules by 
which light harvesting complexes assemble into rings of defined sizes. Ob- 
viously, many questions remain to be answered, for example: Do the rules 
underlying assembly of two subunits remain the same when many subunits 
assemble? In other words, are there effects of different subunit concentrations, 
leading to phase transitions between different ring sizes (see e.g., [52])? What 
is the role of transcription factors and chaperones for assembly in vivol 

Genetic manipulation is a powerful tool to obtain information about the as- 
sembly of light harvesting complexes. Several experimental studies attempting 
to modify the size of an LH ring by truncating or swapping C-terminal do- 
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mains [15, 30] did not observe any alterations of the ring size, although changes 
in the hydrogen bonding could be observed. Olsen et al. speculate that steric 
constraints involving BChls, especially B800 BChls play a role in determining 
the ring size [30]. Our study supports this view as it suggests an important 
role of surface contacts in the hydrophobic region in constraining the angle be- 
tween two subunits. Most contacts in the transmembrane region are made by 
the BChls ligated to the a and (3 polypeptides. Thus, constraining the angle 
between two subunits requires positioning BChls into an orientation that in 
turn favors a particular angle between two subunits due to steric constraints. 
Amino acid substitutions will therefore only have an indirect effect on the 
ring size, by repositioning the (conserved) BChls into different orientations 
through changes in the binding pocket or ligation pattern. It is therefore by 
no means straightforward to predict the effects of amino acid substitutions on 
the ring size. 

Through homology modeling of light-harvesting complexes, one can, in princi- 
ple, emulate the genetic manipulation process. Based on such in silico mutants 
and using the calculation techniques presented here, one can then calculate the 
preferred interaction angles of complexes with amino acid deletions or substi- 
tutions. The goal of such calculations would be to predict new complexes for 
which, e.g. a 10-ring or 7-ring architecture is expected. Using established in 
vivo assembly experiments and AFM imaging techniques, one can test whether 
these de novo designs indeed form the expected ring sizes. 

A successful demonstration of the predictive power of molecular dynamics 
combined with non-equilibrium techniques could pave the way towards un- 
derstanding the principles underlying the assembly of multimeric membrane 
protein complexes. Such understanding would have relevance in controlling 
nanodevices built from photosynthetic materials. Furthermore, formation of 
multimeric protein complexes occurs in many membrane protein complexes. 
Combining in silico with experimental genetic manipulation could therefore 
yield information on the assembly not only of light harvesting complexes, but 
of membrane protein complexes in general. 
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